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Abstract 

We combine the adaptive and multilevel approaches to the BDDC 

£N| and formulate a method which allows an adaptive selection of constraints 

on each decomposition level. We also present a strategy for the solu- 

' j ' tion of local eigenvalue problems in the adaptive algorithm using the 

LOBPCG method with a preconditioner based on standard components 

j /^ of the BDDC. The effectiveness of the method is illustrated on several 

engineering problems. It appears that the Adaptive-Multilevel BDDC 
|i algorithm is able to effectively detect troublesome parts on each decom- 

position level and improve convergence of the method. The developed 
open-source parallel implementation shows a good scalability as well as 

i i applicability to very large problems and core counts. 
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1 Introduction 
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o 

The Balancing Domain Decomposition by Constraints (BDDC) was developed 
by Dohrmann [7] as a primal alternative to the Finite Element Tearing and 
(f~) Interconnecting - Dual, Primal (FETI-DP) by Farhat et al. [8]. Both methods 

i~ I use constraints to impose equality of new 'coarse' variables on substructure in- 

terfaces, such as values at substructure corners or weighted averages over edges 
• *~j and faces. Primal variants of the FETI-DP were also independently proposed by 

rS Cros [5] and by Fragakis and Papadrakakis [9]. It has been shown in [24, 38] that 

these methods are in fact the same as BDDC. Polylogarithmic condition number 
bounds for FETI-DP were first proved in [28] and generalized to the case of coef- 
ficient jumps between substructures in [15]. The same bounds were obtained for 
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BDDC in [20, 21]. A proof that the eigenvalues of the preconditioned operators 
of both methods are actually the same except for the eigenvalues equal to one 
was given in [21] and then simplified in [3, 19, 24]. FETI-DP, and, equivalently, 
BDDC are quite robust. It can be proved that the condition number remains 
bounded even for large classes of subdomains with rough interfaces in 2D [13, 44] 
as well as in many cases of strong discontinuities of coefficients, including some 
configurations when the discontinuities cross substructure boundaries [29, 30]. 
However, the condition number does deteriorate in many situations of practical 
importance and an adaptive method is warranted. 

Adaptive enrichment for BDDC and FETI-DP was proposed in [22, 23], with 
the added coarse functions built from eigenproblems based on adjacent pairs of 
substructures in 2D formulated in terms of FETI-DP operators. The algorithm 
has been developed directly in terms of BDDC operators and extended to 3D 
by Mandel, Sousedfk and Sfstek [27, 35], resulting in a much simplified for- 
mulation and implementation with global matrices, no explicit coarse problem, 
and getting much of its parallelism through the direct solver used to solve an 
auxiliary decoupled system. The only requirement for all these versions of the 
adaptive algorithms is that there is a sufficient number of corner constraints to 
prevent rigid body motions between any pair of adjacent substructures. This 
requirement has been recognized in other contexts [4, 18], and in the context of 
BDDC by Dohrmann [7], and recently by Sfstek et al. [33]. 

Moreover, solving the coarse problem exactly in the original BDDC method 
becomes a bottleneck as the number of unknowns and, in particular, the number 
of substructures gets too large. Since the coarse problem in BDDC, unlike in the 
FETI-DP, has the same structure as the original problem, it is straightforward to 
apply the method recursively to solve the coarse problem only approximately [7] . 
The original, two- level, BDDC has been extended into three-levels by Tu [41, 42] 
and into a general multilevel method by Mandel, Sousedfk and Dohrmann [25, 
26]. Recently the BDDC has been extended into three-level methods for mortar 
discretizations [12], and into multiple levels for saddle point problems [37, 43]. 
The abstract condition number bounds deteriorate exponentially with increasing 
number of levels. 

Here we combine the adaptive and multilevel approaches to the BDDC 
method in order to develop its variant that would preserve parallel scalability 
with an increasing number of subdomains and also show its excellent conver- 
gence properties. The adaptive method works as previously. It selects con- 
straints associated with substructure faces, obtained from solution of local gen- 
eralized eigenvalue problems for pairs of adjacent substructures, however this 
time on each decomposition level. Because of the multilevel approach, the coarse 
problems are treated explicitly (unlike in [27, 35]). The numerical examples show 
that the heuristic eigenvalue-based estimates work reasonably well and that the 
adaptive approach can result on each decomposition level in the concentration 
of computational work in a small troublesome parts of the problem, which leads 
to a good convergence behavior. The developed open-source parallel implemen- 
tation shows a good scalability as well as applicability to very large problems 
and core counts. 
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The theoretical part of this paper presents a part of the work from the the- 
sis [36] in a shorter, self-contained way. Also some results by the serial imple- 
mentation of the algorithm from [36] are reproduced here for comparisons. The 
two-dimensional version of the algorithm was described in conference proceed- 
ings [39] . The main original contribution of this paper is the description of the 
parallel implementation of the method, and numerical study of its performance. 

The paper is organized as follows. In Section 2 we establish the notation 
and introduce problem settings and preliminaries. In Section 3 we recall the 
Multilevel BDDC originally introduced in [26]. In Section 4, we describe the 
adaptive two-level method in terms of the BDDC operators with an explicit 
coarse space. In Section 5 we discuss a preconditioner for LOBPCG used in the 
solution of the local generalized eigenvalue problems in the adaptive method. 
Section 6 contains an algorithm for the adaptive selection of components of the 
Multilevel BDDC preconditioner. Numerical results are presented in Section 8, 
and Section 9 contains summary and concluding remarks. 

2 Notation and substructuring components 

We first establish notation and briefly review standard substructuring concepts 
and describe BDDC components. See, e.g., [17, 34, 40] for more details about 
iterative substructuring in general, and in particular [7, 20, 24, 26] for the 
BDDC. Consider a bounded domain O C I 3 discretized by conforming finite 
elements. The domain 51 is decomposed into N nonoverlapping subdomains fP, 
i = 1, ... TV, also called substructures, so that each substructure Vt 1 is a union of 
finite elements. Each node is associated with one degree of freedom in the scalar 
case, and with 3 displacement degrees of freedom in the case of linear elasticity. 
The nodes contained in the intersection of at least two substructures are called 
boundary nodes. The union of all boundary nodes of all substructures is called 
the interface, denoted by T, and Y l is the interface of substructure fi l . The 
interface F may also be classified as the union of three different types of sets: 
faces, edges and corners. We will adopt here a simple (geometric) definition: a 
face contains all nodes shared by the same two subdomains, an edge contains 
all nodes shared by same set of more than two subdomains, and a corner is a 
degenerate edge with only one node; for a more general definition see, e.g., [14]. 

We identify finite clement functions with the vectors of their coefficients in 
the standard finite element basis. These coefficients are also called variables 
or degrees of freedom. We also identify linear operators with their matrices, in 
bases that will be clear from the context. 

Here, we find it more convenient to use the notation of abstract linear spaces 
and linear operators between them instead of the space K™ and matrices. The 
results can be easily converted to matrix language by choosing a finite element 
basis. The space of the finite element functions on f2 will be denoted as U. Let 
W s be the space of finite element functions on substructure il s , such that all of 
their degrees of freedom on <9£l s n <9f2 are zero. Let 

W ^W 1 x ••• x W N , 
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and consider a bilinear form a (•, •) arising from the second-order elliptic problem 
such as Poisson's equation or a problem of linear elasticity. 

Now U C W is the subspace of all functions from W that are continuous 
across the substructure interfaces. We are interested in the solution of the 
problem 

u G U : a(u,v) = </,«>, Vv G U, (1) 

where the bilinear form a is associated on the space U with the system 
operator A, defined by 

A-.U^U', a(u,v) = {Au,v), Vu,v£U, (2) 

and / G U' is the right-hand side. Hence, (1) is equivalent to 

An = f. (3) 

Define Ui C £/ as the subspace of functions that are zero on the interface T, 
i.e., the 'interior' functions. Denote by P the energy orthogonal projection from 
W onto £//, 

P : w <EW i — > vj £Uj : a (vj, zi) = a (w, z£) , \/zj G f/j. 

Functions from (/ — P) VF, i.e., from the nullspace of P, are called discrete 
harmonic; these functions are a-orthogonal to t/j and energy minimal with 
respect to increments in Ui. Next, let W be the space of all discrete harmonic 
functions that are continuous across substructure boundaries, that is 

W = (I - P)U. (4) 

In particular, 

U = Ui®W, UjLaW. (5) 

The BDDC method [7, 24] is a two-level preconditioner characterized by the 
selection of certain coarse degrees of freedom, such as values at the corners and 
averages over edges or faces of substructures. Define W C W as the subspace 
of all functions such that the values of any coarse degrees of freedom have a 
common value for all relevant substructures and vanish on i9J7, and Wa C W as 
the subspace __of all functions such that their coarse degrees of freedom vanish. 
Next, define Wn as the subspace of all functions such that their coarse degrees 
of freedom between adjacent substructures coincide, and such that their energy 
is minimal. Clearly, functions in Wn are uniquely determined by the values of 
their coarse degrees of freedom, and 

WA-LaWn, and W = W A @W n . (6) 

The component of the BDDC preconditioner formulated iri_the space Wn is 
called the coarse problem and the components in the space Wa are called sub- 
structure corrections. 
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We assume that 

a is positive definite on W. (7) 

This will be the case when a is positive definite on the space U and there are 
sufficiently many coarse degrees of freedom [26]. We further assume that the 
coarse degrees of freedom are zero on all functions from Ui, that is, 

Ui C W A . (8) 

In other words, the coarse degrees of freedom depend on the values on substruc- 
ture boundaries only. From (6) and (8), it follows that the functions in Wn are 
discrete harmonic, that is, 

Wu = (I-P)W u . (9) 

Next, let E be a projection from W onto U, defined by taking some weighted 
average on substructure interfaces. That is, we assume that 

E : W -> U, EU = U, E 2 = E. (10) 

Since a projection is the identity on its range, it follows that E does not change 
the interior degrees of freedom, 

EU T = Ui, (11) 
since Ui C U. Finally, we recall that the operator (J — P) E is a projection [26]. 

3 Multilevel BDDC 

We recall Multilevel BDDC which has been introduced as a particular instance 
of Multispace BDDC in [26]. The substructuring components from Section 2 
will be denoted by an additional subscript i, as fij , s = 1, . . . N\, etc., and called 
level 1. The level 1 coarse problem will be called the level 2 problem. It has the 
same finite element structure as the original problem (1) on level 1, so we put 
U2 = Wui- Level 1 substructures are level 2 elements and level 1 coarse degrees 
of freedom are level 2 degrees of freedom. Repeating this process recursively, 
level i — 1 substructures become level i elements, and the level i substructures 
are agglomerates of level i elements. Level i substructures are denoted by Of, 
s = 1, . . . , Ni, and they are assumed to form a conforming triangulation with 
a characteristic substructure size Hi. For convenience, we denote by fig the 
original finite elements and put Hq = h. The interface I\j on level i is defined as 
the union of all level i boundary nodes, i.e., nodes shared by at least two level i 
substructures, and we note that C r^_i. Level i—1 coarse degrees of freedom 
become level i degrees of freedom. The shape functions on level i are determined 
by minimization of energy with respect to level i — 1 shape functions, subject 
to the value of exactly one level i degree of freedom being one and the other 
level i degrees of freedom being zero. The minimization is done on each level i 
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element (level i — 1 substructure) separately, so the values of level i—1 degrees of 
freedom are in general discontinuous between level i — 1 substructures, and only 
the values of level i degrees of freedom between neighbouring level i elements 
coincide. 

The development of the spaces on level i now parallels the finite element 
setting in Section 2. Denote Ui = Wn,t-i- Let W/ be the space of functions on 
the substructure fif , such that all of their degrees of freedom on <9£!f n dfl are 
zero, and let 

Wi = Wl x •• • x W t N '. 

Then U% C Wi is the subspace of all functions from Wi that are continuous 
across the interfaces I\. Define Uu C Ui as the subspace of functions that are 
zero on I\, i.e., the functions 'interior' to the level i substructures. Denote by 
Pi the energy orthogonal projection from Wi onto Un, 

Pi :w,G Wi i — > Vn eUn : a {v Iu Zn) = a (wi,Zn) , \/zn e Ur- 

Functions from (7 — Pi)Wi, i.e., from the nullspace of Pi, are called discrete 
harmonic on level i; these functions are a-orthogonal to Uu and energy minimal 
with respect to increments in Uu. Denote by Wi C U{ the subspace of discrete 
harmonic functions on level i, that is 

W i = (I-P i )U i . (12) 

In particular, Uu _L a Wi- Define Wi C Wi as the subspace of all functions 
such that each coarse degree of freedom on level i has a common value for all 
relevant level i substructures, and Wai C W as the subspace of all functions 
such that their level i coarse degrees of freedom have zero value. Define Wm 
as the subspace of all functions such that their level i coarse degrees of freedom 
between adjacent substructures coincide, and such that their energy is minimal. 
Clearly, functions in Wm are uniquely determined by the values of their level i 
coarse degrees of freedom, and 

W Al ± a W m , Wi = W Ai ®W m . (13) 

We assume that the level i coarse degrees of freedom are zero on all functions 
from Uu, that is, 

U N C W Al . (14) 

In other words, level i coarse degrees of freedom depend on the values on level % 
substructure boundaries only. From (13) and (14), it follows that the functions 
in Wjii are discrete harmonic on level i, that is 

Wm = (I - Pi) Wm- (15) 

Let E be a projection from Wi onto Ui, defined by taking some weighted average 

on Ti 

Ei : Wi -> U u El = Ei. 
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Since projection is the identity on its range, Ei does not change the level i 
interior degrees of freedom, in particular 

EiU u = U u . (16) 

The Multilevel BDDC method is now defined recursively [7, 26] by solving 
the coarse problem on level i only approximately, by one application of the 
preconditioner on level i + 1. Eventually, at the top level L — 1, the coarse 
problem, which is the level L problem, is solved exactly. A formal description 
of the method is provided by the following algorithm. 

Algorithm 1 (Multilevel BDDC, [26, Algorithm 17]) Define the precon- 
ditioner r\ € U[ i — > u\ G U\ as follows: 
for i = 1,...,L-1, 

Compute interior pre- correction on level i, 

u u eU n :a (u u , z u ) = (n,zn) , Vz« e U u . (17) 

Get an updated residual on level i, 

rBi^U u (r B i,Vi) = (n,Vi) - a(un,Vi) , Vui € Ui. (18) 

Find the substructure correction on level i: 

WAi G WAi ■ a (WAi, ZAi) = (rBi, EiZAi) , Vz A i e WAi- (19) 

Formulate the coarse problem on level i, 

wm G Wm ■ a (wm,zm) = {rBi,EiZm) , Vz m e Wxu- (20) 

If i = L — 1, solve the coarse problem directly and set ul = wxiL—ii 
otherwise set up the right-hand side for level i + 1, 

r.+i £ Wn„ (r i+ i,z i+ i) = (rsiiEiZi+i) , Vz i+ i € Wm = U i+ i, (21) 

end. 

for i = L- 1,...,1, 

Average the approximate corrections on substructure interfaces on level i, 

u B i = E l (w A i + Ui+i) ■ ( 22 ) 
Compute the interior post- correction on level i, 

vu e Uu : a (vu, Zu) = a {u Bl ,z H ) , Vzn € Uu. (23) 
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Apply the combined corrections, 



Ui = uu + u B i - v u . (24) 

end. 

The condition number bound for Multilevel BDDC is given as follows. 

Lemma 1 ([26, Lemma 20]) The condition number n of Multilevel BDDC 
from Algorithm 1 satisfies 

k<lu = Tlfs^Ui , (25) 

where 

^ = sup — ^ -. (26) 

weWi \\ W \\a 

For the purpose of the adaptive selection of constraints, we use the bound 
based on jump jit the interface defined on the subspace of discrete harmonic 
functions from Wi. More precisely, we modify (26) using the identity 

(7 - P^I - Pi) = (I - PJEi (27) 

and the fact that Pi is an a-orthogonal projection, as 

- P t ) E t w\\l Ml-PjEjjl-PjwWl 

LUi = SUp 2 = 8U P 2 

weW, \\ W \\a weWt \\ W \\a 

= sup Mi-n)W-PM\l = sup \\(i-p)e i{ i-p)M\1 
w m \\PM\l + \\(i-Pi)M\l w m \\(i-Pi)wf a 

- P t ) E lW \\l Ml-jl-P^E^wWl 

= SUp 2 = SU P 2 ' 

we(I-Pi)Wi \\ w \\a we(I-Pi)Wi \\ W \\a 

The last equality in (28) holds because (I — Pi) Ei is a projection and the 
norm of a nontrivial projection in an inner product space depends only on the 
angle between its range and its nullspace [10]. 



4 Adaptive Coarse Degrees of Freedom 

To simplify notation, we formulate the algorithm for the adaptive selection of 
the coarse degrees of freedom for one level at a time and drop the subscript i. 
The basic idea of the method is still the same as in [23, 27, 35]. However, 
the current formulation in terms of the BDDC method, though equivalent and 
written similarly as in [27], is different enough to allow for an explicit treatment 
of the coarse space correction. Therefore, it is suitable for multilevel extension 
which will be introduced later in Section 6. 
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As mentioned before, the space W is constructed using coarse degrees of 
freedom. These can be, e.g., values at corners, and averages over edges or faces. 
The space W is then given by the requirement that the coarse degrees of freedom 
on adjacent substructures coincide; for this reason, the terms coarse degrees of 
freedom and constraints are used interchangeably. The edge (or face) averages 
are necessary in 3D problems to obtain scalability with subdomain size. Ideally, 
one can prove the polylogarithmic condition number bound 

k < const ^1 + log , (29) 

where H is the subdomain size and h is the finite element size. 

Remark 1 The initial selection of constraints in the proposed adaptive ap- 
proach will be done in a way such that (29) is satisfied for problems with suffi- 
ciently regular structure. See, e.g., [15] for a theoretical justification. 

To choose the space W, cf. [23, Section 2.3], suppose we are given a space X 
and a linear operator C : W — > X and define, 

W = {w G W : C {I-E) w = 0} . (30) 

The values Cw will be called local coarse degrees of freedom, and the space W 
consists of all functions w whose local coarse degrees of freedom on adjacent 
substructures have zero jumps. To represent their common values, i.e., the 
global coarse degrees of freedom of vectors u G W, we use a space U c and a 
one-to-one linear operator R c : U c — > X such that 

W = {w G W : 3u c G U c : Cw = R c u c } . 

Observe that (I — E) Pv — for all v G W, so we can define the space W in 
(30) using discrete harmonic functions w G Wr = (I — P) W, for which 

(I — (I — P) E)w = (I — P) (I - E) w. (31) 

Let us denote Wr = (I - P)W = Wn W T - Then the bound (26) in the form of 
the last term in equation (28) can be found, for a fixed level i, as a maximum 
eigenvalue of an associated eigenvalue problem, which can be using (31) written 
as 

a((I-P)(I-E)w,(I-P)(I-E)z) = \a{w,z) Vz G Wr- (32) 

We can then control the condition number bound by adding constraints 
adaptively by taking advantage of the Courant-Fisher-Weyl minimax principle, 
cf., e.g., [6, Theorem 5.2], in the same way as in [23, 27, 35]. 

Corollary 1 ([27]) The generalized eigenvalue problem (32) has eigenvalues 
-^l > ^2 > • ■ • > Ki > 0. Denote the corresponding eigenvectors by we. Then, 
for any k = 1, . . . , n — 1, and any linear functionals Lg, I = 1, . . . , k, 

maxl IK 1- "* 3 ) ^- E ) w Wa :w( zw L f w \ =0 W= l,...,k\ > A fe+1 , 

I Ml J" 
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with equality if 

L t (w) = a ((/ - P) (I - E) w t , (I -P)(I- E) w) . (33) 

Therefore, because (I — E) is a projection, the optimal decrease of the con- 
dition number bound (26) can be achieved by adding to the constraint matrix 
C in the definition of W the rows eg defined by cjw = Lg (id). 

Solving the global eigenvalue problem (32) is expensive, and the vectors eg 
are not of the form required for substructuring, i.e., each eg with nonzero entries 
corresponding to only one corner, an edge or a face at a time. For these reasons, 
we replace (32) by a collection of local problems, each defined by considering 
only two adjacent subdomains s and f2*. Subdomains are called adjacent if 
they share a face. All quantities associated with such pairs will be denoted by 
a superscript st . In particular, we define 

W st =W s xW t^ W st = (/ _ p°t)w st , (34) 

where (I — P st ) realizes the discrete harmonic extension from the local interfaces 
V s and r* to interiors. Thus, functions from Wp* are fully determined by their 
values at the local interfaces V s and T*, and they may be discontinuous at the 
common part T st — V s n T*. 

The bilinear form a st (-,-) is associated on the space Wp with the opera- 
tor S st of Schur complement with respect to the local interfaces, defined by 

S st :Wf^Wf, a st (u,v) = (S st u,v), Vu,v€Wf. (35) 

Operator S 8t is represented by a block-diagonal matrix composed of symmetric 
positive semi-definite matrices 5 s and S* of individual Schur complements of 
the subdomain matrices with respect to local interfaces V s and r , resp., 

(36) 

The action of the local projection operator E st is realized as a (weighted) 
average at T st and as an identity operator at (r s U r')\r st . 

Let C st be the operator defining the initial coarse degrees of freedom that are 
common to both subdomains of the pair. We define the local space of functions 
with the shared coarse degrees of freedom continuous as 

W st = {w € W st : C st (I - E st )w = 0} . (37) 

Finally, we introduce the space Wj? = W st Wp . 

Now the generalized eigenvalue problem (32) becomes a localized problem to 
find w € Wp such that 

a st ((l~P st ) (I ~ E st )w,(l - P st ) (I-E st )z) =\a st (w,z) Vz € Wf?. 

(38) 



s s 
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Assumption 1 The corner constraints are already sufficient to prevent relative 
rigid body motions of any pair of adjacent substructures, so 

Vw € W st : a st {w, w) = => (/ - E st ) w = 0, 

i.e., the corner degrees of freedom are sufficient to constrain the rigid body modes 
of the two substructures into a single set of rigid body modes, which are contin- 
uous across the interface T st . 

The maximal eigenvalue w st of (38) is finite due to Assumption 1, and we 
define the heuristic condition number indicator 

uj = max {uj st : f2 s and f2* are adjacent} . (39) 

Considering two adjacent subdomains fl s and fr only, we get the added 
constraints Li (w) = from (33) as 

a st ((l-P st ) (l~E st )w e , (I-P st ) (I-E st )w) =0 W=l,...,fc, (40) 

where vj£ are the eigenvectors corresponding to the k largest eigenvalues 
from (38). 

Let us denote D the matrix corresponding to C st (I — E st ). We define the 
orthogonal projection onto null I? by 

II = I -D T {DD T y 1 D. 

The generalized eigenvalue problem (32) now becomes 

n (i - p st ) T (i - E st ) T s st (i - E st ) (i - p st ) ilw = xns st n w . (4i) 

Since 

nuiins* st n c nuiin (/ - p st ) T (i - E st ) T s st (i - E st ) (i - p st ) n, (42) 

the eigenvalue problem (41) reduces in the factorspace modulo nulHIS' s *ri to 
a problem with the operator on the right-hand side positive definite. In our 
computations, we have used the subspace iteration method LOBPCG [16] to 
find the dominant eigenvalues and their eigenvectors. The LOBPCG iterations 
then simply run in the factorspace. 

From (41), the constraints to be added are 

L £ (w) = wjn (I - P st ) T (I - E st ) T S st (I - E st ) (I - P st ) Ilw = 0. 

That is, we wish to add to the constraint matrix C the rows 

cf = wJU (I - P st ) T (I - E st f S st (I - E st ) (I - P st ) PL (43) 

Proposition 1 ([27]) The vectors c|', constructed for a domain consisting of 
only two substructures Q s andfl 1 , have matching entries on the interface between 
the two substructures, with opposite signs. 
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That is, each row cf can be split into two blocks and written as 



c f — [ c l 




Either half of each row from the block cf* is then added into the matrices C 8 
and C* corresponding to the subdomains f2 s and f2*. Unfortunately, the added 
rows will generally have nonzero entries over the whole V s and r , including the 
edges in 3D where s and f2* intersect other substructures. Consequently, the 
added rows are not of the form required for substructuring, i.e., each row with 
nonzeros in one edge or face only. In the computations reported in Section 8, 
we drop the adaptively generated edge constraints in 3D. Then it is no longer 
guaranteed that the condition number indicator ui < r. However, the method 
is still observed to perform well. 

The proposed adaptive algorithm follows. 

Algorithm 2 (Adaptive BDDC [23]) Find the smallest k for every two ad- 
jacent substructures fl s and f2* to guarantee that Aj^ +1 < t, where r is a given 
tolerance, and add the constraints (40) to the definition ofW. 



As pointed out already for adaptive 2-level BDDC method in [27], an important 
step for a parallel implementation of the adaptive selection of constraints is an 
efficient solution of the generalized eigenvalue problem (41) for each pair of 
adjacent subdomains. 

There are several aspects of the method immediately making such implemen- 
tation challenging: (i) parallel layout of pairs of subdomains does not follow the 
natural layout of a domain decomposition computation with distribution of data 
based on subdomains, (ii) the multiplication by S st on both sides of equation 
(41) is done only implicitly, since action of S s and S 1 * is available only through 
solution of local discrete Dirichlet problems on subdomains Of and fi*, (iii) 
the process responsible for solving an st-eigenproblem typically does not have 
data for subdomains fif and O*, and thus it has to communicate the vector for 
multiplication to processors able to compute the actions of S s and S*. 

With respect to these issues, it is necessary to use an inverse-free method for 
the solution of each of these problems. In our case, the LOBPCG method [16] is 
applied to find several largest eigenvalues and corresponding eigenvectors we 
solving the homogeneous problem 



A = n (i - p st ) T (i - E st ) T s st (i - E st ) (i - p st ) n, b = us st n, 



and M. a suitable preconditioner. The LOBPCG method requires only multi- 
plications by matrices M, A, and B, and it can run in the factorspace with B 



5 Preconditioned LOBPCG 



M(A-XtB)wt = 0, 



(44) 



with 
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only positive semi-definite. This is important for our situation - although each 
pair of subdomains has enough initial constraints by corners and edge averages 
to avoid mechanisms between the two substructures (enforced by the projection 
II), no essential boundary conditions are applied to the pair as a whole, and 
matrix B of such 'floating' pair typically has nontrivial nullspace (e.g. rigid 
body modes for elasticity problems). 

Initial experiments in [27, 35, 36] revealed that while the unpreconditioned 
LOBPCG {M. = I) works reasonably well for simple problems, it requires pro- 
hibitively many iterations for problems with very irregular substructures and/or 
high jumps in coefficients. Since each iteration requires communicating the vec- 
tor for multiplication, reducing the iteration counts of LOBPCG by precondi- 
tioning is a very sensible way of accelerating the adaptive BDDC method. 

Recall, that the BDDC method provides a preconditioner for the inter- 
face problem of the JBchur complement by the exact solution of the problem 
at the larger space W. As such, components of a BDDC implementation, the 
coarse solver and subdomain corrections, can be used to determine the approx- 
imate action of the Moore-Penrose pseudoinverse of the matrix B, denoted as 
Mbddc ~ (IIS' st n) + . This operator can be used as the preconditioner M 
for problem (44), effectively converting the generalized eigenvalue problem to 
an ordinary eigenproblem inside the iterations. Using notation from (36), the 
preconditioner is formally written as 



71 rloc 

1U BDDC 



n [j o] 



"5 s * 




-1 


I 


C 











s at v) + w n, 



(45) 



where in addition C 



C" 



is the matrix of initial constraints (continuity 



at corners and arithmetic averages on edges), and 



denotes the 



matrix of coarse basis functions for the two subdomains. Here R l c , i = s, t, 
is the zero-one matrix of restriction of the vector of global coarse degrees of 
freedom of the pair to subdomain coarse degrees of freedom. Since some coarse 
degrees of freedom are shared by the two subdomains, corresponding columns 
in \1/ are nonzero in both parts, while columns of coarse degrees of freedom not 
common to the two subdomains are only nonzero in either ^ s R a or 4 ,t i?'. Let 
us recall that in BDDC, the local coarse basis functions are computed as the 
solution to the problem with multiple right hand sides 



(46) 



which represents an independent saddle-point problem with invertible matrix 
for each subdomain, and factorization of which is later reused in applications 
of the preconditioner (45). We also note that the coarse matrix is in the im- 
plementation explicitly computed using the second part of the solution of (46) 
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as 

qTgsty = _ R f^ R s _ R f^ R ^ (47) 

The coarse matrix of the si-pair ty T S sti ff has dimension of the union of the 
coarse degrees of freedom of the subdomains of the pair, and it is typically only 
positive semi-definite for a floating pair. Due to its small dimension, we compute 
its pseudoinverse by means of dense eigenvalue decomposition performed by the 
LAPACK library, 

^Tgsty = VAV t^ ^T S st^+ _ yj^-^v T , (48) 

where diagonal matrix A' arises from A by dropping eigenvalues lower than a 
prescribed tolerance. 

Unlike in the standard BDDC prcconditioncr, no interface averaging is ap- 
plied to the function before and after the action of Mfen C , because problem 
(41) is defined in the space Wf? . Correspondingly, the only approximation is 
due to using A' instead of A. 

6 Adaptive-Multilevel BDDC 

We build on Sections 3 and 4 to propose a new variant of the Multilevel BDDC 
with adaptive selection of constraints on each level. 

The development of adaptive selection of constraints in Multilevel BDDC 
now proceeds similarly as in Section 4. We formulate (26) as a set of eigen- 
value problems for each decomposition level. On each level we solve for every 
two adjacent substructures a generalized eigenvalue problem and we add the 
constraints to the definitions of W,. 

The heuristic condition number indicator is defined as 

uj = n^T^Wi, uJi — max {wf : f2f and f2* are adjacent} . (49) 

We now describe the Adaptive-Multilevel BDDC in more detail. The al- 
gorithm consists of two main steps: (i) set-up (including adaptive selection of 
constraints), and (ii) loop of the preconditioned conjugate gradients (PCG) with 
the Multilevel BDDC from Algorithm 1 as a preconditioner. The set-up can be 
summarized as follows (cf. [36, Algorithm 4] for the 2D case): 

Algorithm 3 (Set-up of Adaptive-Multilevel BDDC) Adding of coarse 
degrees of freedom to guarantee that the condition number indicator u) < r L_1 , 
for a given target value t: 

for levels i = 1 : L — 1, 

Create substructures with roughly the same numbers of degrees of freedom. 

Find a set of initial constraints (in particular sufficient number of corners), 
and set up the BDDC structures for the adaptive algorithm (the next loop 
over faces). 
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for all faces J 7 , on level i, 

Compute the largest local eigenvalues and corresponding eigenvectors, until 
the first m st is found such that A^ st < r. 

Compute the constraint weights and add these rows to the subdomain ma- 
trices of constraints C s and C . 

end. 

Set-up the BDDC structures for level i. 

If the prescribed number of levels is reached, solve the problem directly. 

end. 

7 Implementation remarks 

Serial implementation has been developed in Matlab in the thesis [36]. Parallel 
results use the open-source package BDDCML 1 (version 2.0). This solver is 
written in Fortran 95 programming language and parallelized using MPI library. 
Apart of symmetric positive definite problems studied in this paper, the solver 
also supports symmetric indefinite and general non-symmetric linear systems 
arising from discretizations of PDEs. 

The matrices of the averaging operator E were constructed with entries pro- 
portional to the diagonal entries of the substructure matrices before elimination 
of interiors, which is also known as the stiffness scaling [13]. 

7.1 Initial constraints 

Following Remark 1, in order to satisfy the poly logarithmic condition number 
bounds, we have used corners and arithmetic averages over edges as initial 
constraints. It is essential (Assumption 1) to generate a sufficient number of 
initial constraints to prevent rigid body motions between any pair of adjacent 
substructures. The selection of corners in our parallel implementation follows 
the recent face-based algorithm from [33]. 

7.2 Adaptive constraints 

The adaptive algorithm uses matrices and operators that are readily available 
in an implementation of the BDDC method with an explicit coarse space, with 
one exception: in order to satisfy the local partition of unity, cf. [24, eq. (9)], 

T?St TjSt J 

U i - 1 1 

we need to generate locally the weight matrices Ef to act as an identity operator 
at (r s U r t )\r s * (cf. Section 4). 

1 http : //www .math. cas . cz/~sistek/sof tware/bddcml .html 
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In the computations reported in Section 8, we drop the adaptively generated 
edge constraints in 3D. Then, it is no longer guaranteed that the condition 
number indicator uj < r i_1 . However, the method is still observed to perform 
well. Since the constraint weights are thus supported only on faces, and the 
entries corresponding to edges are set to be zero, we orthogonalize and normalize 
the vectors of constraint weights (by reduced QR decomposition from LAPACK) 
to preserve numerical stability. 

In our experience, preconditioning of the LOBPCG method as described in 
Sec. 5 led to a considerable reduction of the number of LOBPCG iterations. Or 
in other words, since we usually put a limit of maximum 15 iterations for an 
eigenproblem, the resulting eigenvectors are much better converged than with- 
out preconditioning. In the parallel implementation of the adaptive selection 
of constraints, pairs are assigned to processors independently of assignment of 
subdomains. The BDDCML package uses the open-source implementation of 
LOBPCG method [16] available in the BLOPEX package 2 . Details of the paral- 
lel implementation of adaptive selection of constrains were described for 2-level 
BDDC method in detail in [31]. 

7.3 Multilevel implementation 

The BDDCML library allows assignment of multiple subdomains at each pro- 
cess. At each level, subdomains are assigned to available processors, always 
starting from root. Distribution of subdomains on the first level is either pro- 
vided by user's application, or created by the solver using ParMETIS library 
(version 3.2). On higher levels, where the mesh is considerably smaller, METIS 
(version 4.0) [11] is internally used by BDDCML to create mesh partitions. This 
means, that on higher levels, where number of subdomains is lower than number 
of processors, cores with higher ranks are left idle by the preconditioner. 

For solving local discrete Dirichlet and Neumann problems on each subdo- 
main, BDDCML relies on a sequential instance of direct solver MUMPS [1]. A 
parallel instance of MUMPS is also invoked for factorization and repeated solu- 
tion of the final coarse problem at the top level. More details on implementation 
of the (non- adaptive) multilevel approach in BDDCML can be found in [32]. 

8 Numerical Examples 

To study the properties of the Adaptive-Multilevel BDDC method numerically, 
we have selected four problems of structural analysis - two artificial bench- 
mark problems and two realistic engineering problems. Some of the results were 
obtained by our serial implementation written in Matlab and reported in the- 
sis [36]. This implementation is mainly used to study convergence behaviour 
with respect to prescribed tolerance on the condition number indicator r. The 
other set of results is obtained using our newly developed parallel implementa- 
tion within the BDDCML library. Parallel results were obtained on Cray XE6 

2 http : //code . google . com/p/blopex 
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supercomputer Hector at the Edinburgh Parallel Computing Centre. In the 
computations, one step of the Adaptive-Multilevel BDDC method is used as the 
preconditioner in the preconditioned conjugate gradient (PCG) method, which 
is run until the (relative) norm of residual decreases below 10 -8 (in Matlab 
tests) or 1CT 6 (in BDDCML runs). 

8.1 Elasticity in a cube without and with jump in material 
coefficients 

As the first problem, we use the standard benchmark problem of a unit cube. 
In our setting, we solve the elastic response of the cube under loading by its 
own weight, when it is fixed at one vertical edge. There are nine bars cutting 
horizontally through the cube. We test the case when the bars are of the same 
material as the rest of the cube (homogeneous material) and the case when 
Young's modulus of the outer material Ei is 10 6 times smaller than that of 
the bars E2, creating contrast in coefficients E2/E1 = 10 6 . In Fig. 1 (right), 
the (magnified) deformed shape of the cube is shown for this jump in Young's 
modulus. We have recently presented a detailed study of behaviour of the 
standard (2-level) BDDC method and its adaptive extension with respect to 
contrast on the same problem in [31]. It was shown in that reference, that while 
convergence of BDDC with the standard choice of arithmetic averages on faces 
quickly deteriorates with increasing contrast, adaptive version of the algorithm 
is capable of maintaining good convergence also for large values of contrast, at 
the cost of quite expensive set-up phase. 

The multilevel approach (without adaptivity), although it may lead to faster 
solution, suffers from exponentially growing condition number and related num- 
ber of iterations, as reported in [26], or recently in [32]. Here, we investigate 
the effect of constraints adaptively generated at higher levels in the multilevel 
algorithm. We also study the parallel performance of our solver on this test 
problem. 

The cube is discretized using uniform mesh of tri-linear finite elements and 
divided into an increasing number of subdomains. On the first level, subdomains 
are cubic with constant H/h = 16 ratio (see Fig. 1 left for an example of a 
division into 64 subdomains). On higher levels, divisions into subdomains are 
created automatically inside BDDCML by the METIS package, in general not 
preserving cubic shape of subdomains. 

In Tabs. 1 and 2, we present results of a weak scaling test for the case of 
the homogeneous cube, i.e. EijE\ = 1. This problem is very well suited for 
the BDDC method, and the performance is generally very good. The growing 
problem is solved on 8 to 32768 processors (with each core handling one subdo- 
main of the first level). In these tables, N denotes the number of subdomains 
(and computer cores), n denotes global problem size, n r represents the size of 
the reduced problem defined at the interface T, n/ is the number of faces in 
divisions on the levels (corresponding to number of generalized eigenproblems 
solved in the adaptive approach), 'its.' is the number of iterations needed by 
the PCG method, and 'cond.' is the estimated condition number obtained from 
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Z-displacement 




l-6e-6 
-6.688e-6 



E -4e-6 



Figure 1: Example of a division of the cube into 64 subdomains (left) and 
(magnified) deformed shape for contrast E 2 /Ei — 10 6 coloured by vertical dis- 
placement (right). 



the tridiagonal matrix generated in PCG. We report times needed by the set-up 
phase ('set-up'), by PCG iterations ('PCG') and their sum ('solve'). 

In Tab. 1, no adaptivity is used, and only the number of levels is varying. 
We can see, that for the standard (2-level) BDDC, we obtain the well-known 
independence of number of iterations on problem size. We can also see, how 
condition number (and number of PCG iterations) grows when using more lev- 
els. Although this can lead to savings in time in certain circumstances (due-to 
cheaper set-up), no such benefits are seen here and these are more common in 
tests of strong scaling with fixed problem size [32] . 

The independence on problem size is slightly biased on higher levels, prob- 
ably due to the irregular subdomains. Computational times slightly grow with 
problem size, suggesting sub-optimal scaling of BDDCML, especially when go- 
ing from 512 to 4096 computing cores. For the largest problem of 32x32x32 
subdomains with 405 million degrees of freedom solved on 32768 cores, all times 
grow considerably. This is most likely due to the higher cost of global com- 
munication functions at this core count, and these results will serve for future 
performance analysis and optimization of the BDDCML solver. Note, that in 
the case of two levels, parallel direct solver MUMPS failed to solve the result- 
ing coarse problem at this level of parallelism, which is marked by 'n/a' in the 
tables. 

We are now interested in the effect of adaptively generated constraints on 
convergence of the multilevel BDDC method. Based on recommendations from 
[31], we limit number of LOBPCG iterations to 15 and maximal number of 
computed eigenvectors to 10 to maintain the cost of LOBPCG solution low. 
The target condition number limit is set low, t = 1.5, which leads to using 
most of the adaptively generated constraints in actual computation. Results 
are reported in Tab. 2. We can see, that the adaptive approach is capable 
of keeping the iteration counts lower, and although the independence of the 
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number of levels is not achieved, the growth is slower than for the non-adaptive 
approach. While the scalability of the solver is similar to the non-adaptive case, 
it is not surprising that the computational time is now dominated by the solution 
of the generalized eigenvalue problems. This fact makes the adaptive method 
unsuitable for simple problems like this one, in agreement with conclusions for 
the 2-level BDDC in [31]. 

The situation changes however, when some numerical difficulties appear in 
the problem of interest. One source of such difficulties may be presented by 
jumps in material coefficients. To model this effect, we increased the jump 
between Young's moduli of the stiff rods and soft outer material to E 2 /Ei = 10 6 , 
and these results are reported in Tabs. 3 and 4. For the non-adaptive method 
(Tab. 3) we can see growth of number of iterations and condition number not 
only with adding levels, but also for growing problem size. This growth is 
translated to large time spent in PCG iterations, which now dominate the whole 
solution. 

Results are very different for the adaptive approach in Table 4, for which the 
main cost is still presented by the solution of the related eigenproblems (included 
into time of 'set-up'). Since we keep the number of computed eigenvectors con- 
stant (ten) for each pair of subdomains, the method is not able to maintain a 
low condition number after all these eigenvectors are used for generating con- 
straints. However, number of iterations is always significantly lower than in the 
non-adaptive approach, and the method typically requires about one half of the 
computational time. While this is an important saving of computational time, 
it is also shown in [31], that the adaptive approach can solve even problems with 
contrasts such high, that they are not solvable by the non-adaptive approach 
with arithmetic averages on all faces and edges. 

Finally, we compare properties of the coarse basis functions on the first 
and the second level on this problem. We consider homogeneous material of 
the cube which is divided into regular cubic subdomains both on the first and 
the second (unlike in the previous test) level. Namely, the cube is divided 
into 4x4x4=64 subdomains on the second level. Each of these subdomains is 
composed again of 4x4x4=64 subdomains of the first level, which gives 4096 
subdomains. Each of these first-level subdomains is composed of 4x4x4=64 
tri-linear finite elements. The problem has in total 262144 elements and 823872 
unknowns. Table 5 summarizes results of the adaptive 3-level BDDC method 
for different values of prescribed tolerance r. For comparison, the non-adaptive 
3-level BDDC method with three arithmetic averages on each face requires 19 
PCG iterations and the resulting estimated condition number is 6.88. 

We can see, that significantly (roughly five times) more constraints are se- 
lected on the second level than on the first one, which suggests that the discrete 
harmonic basis functions of the first level lead to worse conditioned coarse prob- 
lem on the second level. Thus, it underlines the importance of adaptive selection 
of constraints on higher levels. For r 2 = 2.25, the maximal number of adap- 
tive constraints (ten) is used on each pair, and the algorithm is 'saturated'. 
Consequently, more constraints would be necessary on each pair to satisfy the 
condition u> < t 2 from Algorithm 3. 
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N 

£ = l(/2/3) 


n f 

n nr 1= l(/2/3) 


its. cond. 


time (sec) 
set-up PCG solve 


2 levels 


8 

64 
512 
4096 


O.IM 9.5k 12 
0.8M O.IM 0.1k 
6.4M l.OM 1.3k 
50.9M 8.4M 11.5k 


15 6.7 

19 7.3 

20 6.8 

n/a n/a 


3.9 1.6 5.5 
4.6 2.1 6.7 
9.4 3.2 12.6 

n/a n/a n/a 


3 levels 


64/8 
512/64 
4096/512 
32768/128 


0.8M O.IM 0.1k/18 
6.4M l.OM 1.3k/295 
50.9M 8.4M 11.5k/2930 
405.0M 69. 1M 95.2k/664 


23 9.6 

30 16.9 

31 13.2 
36 24.7 


4.5 2.4 7.0 
5.7 3.6 9.3 
19.0 7.3 26.3 
165.8 20.0 185.7 


4 levels 


512/64/8 
4096/512/64 
32768/512/8 


6.4M l.OM 1.3k/295/23 
50.9M 8.4M 11.5k/2930/380 
405.0M 69.1M 95.2k/2921/23 


41 24.5 
64 87.7 
45 33.0 


5.5 4.8 10.4 
9.2 11.5 20.8 
156.5 24.7 181.2 



Table 1: Weak scaling for the cube problem with homogeneous material, non- 
adaptive multilevel BDDC. 



N 

£=l(/2/3) 


n 


n r 


n f 

£=l(/2/3) 


its. 


cond. 


time (sec 
set-up PCG 


) 

solve 


2 levels 


8 


0.1M 


9.5k 


12 


11 


2.5 


56.1 


1.2 


57.3 


64 


0.8M 


0.1M 


0.1k 


13 


3.1 


119.3 


1.5 


120.9 


512 


6.4M 


l.OM 


1.3k 


14 


3.1 


160.8 


2.4 


163.3 


4096 


50.9M 


8.4M 


11.5k 


n/a 


n/a 


n/a 


n/a 


n/a 


3 levels 


64/8 


0.8M 


O.IM 


0.1k/18 


14 


3.3 


121.0 


1.6 


122.7 


512/64 


6.4M 


l.OM 


1.3k/295 


17 


4.2 


166.9 


2.4 


169.3 


4096/512 


50.9M 


8.4M 


11.5k/2930 


18 


4.4 


221.7 


5.5 


227.3 


32768/128 


405. 0M 


69.1M 


95.2k/664 


20 


4.8 


940.3 


23.6 


963.9 


4 levels 


512/64/8 
4096/512/64 
32768/512/8 


6.4M 
50.9M 
405.0M 


l.OM 
8.4M 
69. 1M 


1.3k/295/23 
11.5k/2930/380 
95.2k/2921/23 


22 
31 
30 


6.9 
12.2 
10.6 


175.3 
289.5 
723.1 


3.1 
7.9 
40.9 


178.4 
297.5 
764.0 



Table 2: Weak scaling for the cube problem with homogeneous material, adap- 
tive multilevel BDDC. 
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N 

£ = l(/2/3) 


n 


n F 


n f 

£ = l(/2/3) 


its. 


cond. 


set-up 


time (sec) 
PCG 


solve 


2 levels 


8 


0.1M 


9.5k 


12 


582 


236k 


4.0 


59.4 


63.4 


64 


0.8M 


0.1M 


0.1k 


1611 


233k 


4.7 


171.9 


176.6 


512 


6.4M 


l.OM 


1.3k 


2195 


240k 


9.5 


340.4 


350.0 


4096 


50.9M 


8.4M 


11.5k 


n/a 


n/a 


n/a 


n/a 


n/a 


3 levels 


64/8 


0.8M 


0.1M 


0.1k/18 


2218 


239k 


4.7 


234.1 


238.8 


512/64 


6.4M 


l.OM 


1.3k/295 


2830 


250k 


5.5 


328.2 


333.7 


4096/512 


50.9M 


8.4M 


11.5k/2930 


4636 


587k 


19.3 


1096.2 


1115.5 


32768/128 


405.0M 


69. 1M 


95.2k/664 


6914 


737k 


155.0 


3820.8 


3975.8 


4 levels 


512/64/8 


6.4M 


l.OM 


1.3k/295/23 


3771 


729k 


5.4 


434.4 


439.8 


4096/512/64 


50.9M 


8.4M 


11.5k/2930/380 


8548 


1860k 


9.3 


1502.3 


1511.6 


32768/512/8 


405.0M 


69.1M 


95.2k/2921/23 


9532 


2362k 


160.2 


5096.6 


5256.8 



Table 3: Weak scaling for the cube problem with jump in coefficients EijE\ — 
10 , non-adaptive multilevel BDDC. 
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£ = l(/2/3) 


n 




n f 

£ = l(/2/3) 


its. 


cond. 


set-up 


time (sec) 
PCG 


solve 


2 levels 


8 


0.1M 


9.5k 


12 


119 


1951 


34.1 


12.3 


46.5 


64 


0.8M 


0.1M 


0.1k 


76 


102 


96.0 


8.1 


104.1 


512 


6.4M 


l.OM 


1.3k 


58 


55 


164.2 


8.9 


173.2 


4096 


50.9M 


8.4M 


11.5k 


n/a 


n/a 


n/a 


n/a 


n/a 


3 levels 


64/8 


0.8M 


0.1M 


0.1k/18 


457 


48k 


96.7 


48.0 


144.7 


512/64 


6.4M 


l.OM 


1.3k/295 


82 


0.1k 


165.7 


10.2 


175.9 


4096/512 


50.9M 


8.4M 


11.5k/2930 


282 


165k 


238.7 


74.1 


312.9 


32768/128 


405.0M 


69. 1M 


95.2k/664 


270 


24k 


909.4 


297.6 


1207.0 


4 levels 


512/64/8 


6.4M 


l.OM 


1.3k/295/23 


554 


63k 


169.5 


68.3 


273.7 


4096/512/64 


50.9M 


8.4M 


11.5k/2930/380 


3392 


671k 


299.3 


800.1 


1099.4 


32768/512/8 


405.0M 


69. 1M 


95.2k/2921/23 


3762 


10495k 


697.6 


4925.1 


5622.7 



Table 4: Weak scaling for the cube problem with jump in coefficients E 2 /Ei = 
10 6 , adaptive multilevel BDDC. 
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r 2 


1st level (11 520 pairs) 
ad. cstrs. cstrs./pair ui\ 


2nd level (144 pairs) 
ad. cstrs. cstrs./pair 


Ll> 2 




its. 


cond. 


25.0 


120 


0.01 


4.45 


84 


0.58 


4.37 


19.53 


21 


7.18 


16.0 


2220 


0.19 


2.70 


132 


0.92 


3.33 


8.99 


18 


4.77 


9.0 


2220 


0.19 


2.70 


228 


1.58 


2.92 


7.89 


18 


4.72 


4.0 


15 660 


1.36 


1.99 


1116 


7.75 


1.98 


3.93 


13 


2.77 


2.25 


69 960 


6.07 


1.42 


1440 


10.00 


2.49 


3.55 


14 


3.25 



Table 5: Comparison of adaptively selected constraints for different target 
condition number r 2 ; 'ad. cstrs.' - number of added adaptive constraints, 
'cstrs./pair' - average number of constraints added for one pair, uj = ui\ui2 ~ the 
condition number indicator from (49); adaptive 3-level BDDC. 

8.2 Elasticity in a cube with variable size of regions of 
jumps in coefficients 

The performance of the Adaptive- Multilevel BDDC method in the presence of 
jumps in material coefficients has been tested on a cube designed similarly as 
the problem above, with material properties E\ = 10 6 , v\ = 0.45, and E 2 — 
2.1 • 10 11 , v 2 = 0.3. However, the stiff bars now vary in size, and while the thin 
bars create numerical difficulties on the first level, the large bar creates a jump 
in the decomposition on the second level, see Fig. 2. The computational mesh 
consists of 823k degrees of freedom and it is distributed into 512 substructures 
with 1344 faces on the first decomposition level, and into 4 substructures with 
4 faces on the second decomposition level (see Fig. 2). 

First, we present results by our serial implementation in Matlab, published 
initially in the thesis [36]. We include them here along the parallel results to 
make this study of Adaptive- Multilevel BDDC more self-contained. Comparing 
the results in Tabs. 6 and 7 we see that a relatively small number of (additional) 
constraints leads to a considerable decrease in number of iterations of the 2- 
level method. In these tables, Nc denotes number of constraints, 'c','c+e', 
'c+e+f denote combinations of constraints at corners, and arithmetic averages 
at edges and faces, '3eigv' corresponds to using three adaptive constraints on 
faces instead of the three arithmetic averages, r denotes the target condition 
number from Algorithm 3, ui is the indicator of the condition number from (49), 
'cond.' denotes estimated condition number, and 'its.' the number of PCG 
iterations. 

When the non-adaptive 2-level is replaced by the 3-level method (Tabs. 6 and 
8), the condition number estimate as well as the number of iterations grow, in 
agreement with the estimate (25). However, with the adaptive 3-level approach 
(Tab. 9) we were able to achieve nearly the same convergence properties for 
small r as in the adaptive 2-level method (Tab. 7). 

Next, we use this test problem to perform a strong scaling test of our par- 
allel implementation of adaptive multilevel BDDC method. Since BDDCML 
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Figure 2: Cube with variable size of regions of jumps in coefficients: distribu- 
tion of material with stiff bars (E2 =2.1- 10 11 , = 0.3) , and soft outer material 
(Ei = 10 6 , v\ = 0.45) (left), mesh consisting of 823k degrees of freedom dis- 
tributed into 512 substructures with 1344 faces on the first decomposition level 
(centre), and 4 substructures with 4 faces on the second decomposition level 
(right). Reproduced from [36]. 



constraint 


Nc cond. its. 


c 
c+e 
e+e+f 
c+e+f (3eigv) 


2163 312 371 > 3 000 
5 691 45 849 1521 
9 723 16 384 916 
9 723 3 848 367 



Table 6: Results for the cube with variable size of regions of jumps in coefficients 
(Fig. 2) obtained using the non-adaptive 2-level BDDC method. Reproduced 
from [36]. 



T 


Nc 




cond. 


its. 


oo(=c+e) 


5 691 


O(10 4 ) 


45 848.60 


1521 


10 000 


5 883 


8 776.50 


5 098.60 


441 


1000 


6 027 


5.33 


9.92 


32 


10 


6149 


6.25 


6.66 


28 


5 


9119 


< 5 


4.79 


24 


2 


25 009 


< 2 


2.92 


18 



Table 7: Results for the cube with variable size of regions of jumps in coefficients 
(Fig. 2) obtained using the adaptive 2-level BDDC method. Reproduced from 
[36]. 
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constraint 


Nc cond. its. 


c 
c+e 
c+e+f 
c+e+f (3eigv) 


2163/18 O(10 7 ) > 3000 
5691/21 O(10 6 ) > 3 000 
9723/33 461750 1573 
9723/33 125 305 981 



Table 8: Results for the cube with variable size of regions of jumps in coefficients 
(Fig. 2) obtained using the non-adaptive 3-level BDDC method. Reproduced 
from [36]. 



T 2 


Nc 




cond. 


its. 


oo(=c+e) 


5691 + 21 




O(10 b ) 


> 3000 


10 000 


5883/28 


8776.50 


26 874.40 


812 


1000 


6027/34 


766.82 


1449.50 


145 


100 


6027/53 


99.05 


100.89 


59 


10 


6149/65 


7.93 


7.91 


30 


5 


9119/67 


< 5 


6.18 


25 


2 


25 009/122 


< 2 


3.08 


18 



Table 9: Results for the cube with variable size of regions of jumps in coefficients 
(Fig. 2) obtained using the adaptive 3-level BDDC method. Reproduced from 
[36]. 
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supports assigning several subdomains to each processor, the division is kept 
constant with 512 subdomains on the basic level, and 4 subdomains on the 
second level (as in Fig. 2), and number of cores is varied. 

Figure 3 presents parallel computational time and speed-up when this prob- 
lem is solved by the parallel 2-level BDDC method, comparing efficiency of 
the non-adaptive and adaptive solver. We report times and speed-ups indepen- 
dently for the set-up phase (including solution of eigenproblems for adaptive 
method), the phase of PCG iterations, and their sum ('solve'). Figure 4 then 
presents parallel computational time and speed-up for S-level BDDC method. 

We can see, that both phases of the solution are reasonably scalable. For 
large core counts, scalability worsens, as each core has only little work with 
subdomain problems and (the less scalable) solution of the coarse problem dom- 
inates the computation. It is worth noting, that the overall fastest solution was 
delivered by the adaptive 2-level BDDC method on 512 cores, while both other 
extensions of BDDC - non-adaptive 3-level BDDC and adaptive 3-level BDDC 
- were also considerably faster than the standard (non-adaptive 2-level) BDDC 
method on large number of cores. 
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set-up (adaptive) 
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Figure 3: Strong scaling test for the cube with variable size of regions of 
jumps in coefficients (Fig. 2) containing 823k degrees of freedom, on the first 
level divided into 512 subdomains with 1344 faces with arithmetic/adaptive 
constraints. Computational time (left) and speed-up (right) separately for set- 
up and PCG phases, and their sum ('solve'), comparison of non-adaptive (680 
its.) and adaptive (85 its.) parallel 2-level BDDC. 



8.3 Linear elasticity analysis of a mining reel 

The performance of the Adaptive- Multilevel BDDC has been tested on an engi- 
neering problem of linear elasticity analysis of a mining reel. The problem was 
provided for testing by Jan Lestina and Jaroslav Novotny. The computational 
mesh consists of 141k quadratic finite elements, 579k nodes, and approximately 
1.7M degrees of freedom. It was divided into 1024 subdomains with 3893 faces 
(see Fig. 5). 
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Figure 4: Strong scaling test for the cube with variable size of regions of jumps 
in coefficients (Fig. 2) containing 823k degrees of freedom, on the first level di- 
vided into 5f2 subdomains with 1344 faces with arithmetic/adaptive constraints, 
and on the second level into 4 subdomains with 4 faces. Computational time 
(left) and speed-up (right) separately for set-up and PCG phases, and their sum 
('solve'), comparison of non-adaptive (894 its.) and adaptive (150 its.) parallel 
3-level BDDC. 



The problem presents a very challenging application for iterative solvers due 
to its very complicated geometry. It contains a steel rope, which is not modelled 
as a contact problem but just by a complicated mesh with elements connected 
through edges of three-dimensional elements (Fig. 6). Its automatic partitioning 
by METIS creates further problems such as thin elongated subdomains, discon- 
nected subdomains, or subdomains with insufficiently coupled elements leading 
to 'spurious mechanisms' inside subdomains. See Fig. 6 for examples. 

We first perform a series of computations by our serial implementation in 
Matlab to study the effect of prescribed target condition number r on con- 
vergence. Comparing results by non-adaptive 2-level BDDC (Tab. 10) with 
adaptive 2-level BDDC (Tab. 11), we see that the adaptive approach allows for 
a significant improvement in the number of iterations. 

We can also see, that convergence of the adaptive two- and three-level 
method (Tables 11 and 12) is nearly identical. For the three-level method, 
automatic division into 32 subdomains was used on the second level. 

We note that the observed approximate condition number computed from 
the Lanczos sequence in PCG ('cond.') is larger than the target condition 
number r for this problem. In [23, 36], it was shown that these two numbers 
match remarkably well for simpler problems, especially in 2D. Despite of this 
difference, the algorithm still performs very well. 

Next, we solved this problem with the parallel implementation of the algo- 
rithm. In Fig. 7, we present results of a strong scaling test. 

The non-adaptive 2-level BDDC method requires 610 PCG iterations, while 
the adaptive 2-level BDDC needs only 200 PCG iterations. Nevertheless, this 
difference is only able to compensate the cost of solving the eigenproblems, and 
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Figure 5: Finite element discretization and substructuring of the mining reel 
problem, consisting of 1.7M degrees of freedom, divided into 1024 subdomains 
with 3893 faces. Data by courtesy of Jan Lestina and Jaroslav Novotny. Re- 
produced from [36]. 



constraint 


Nc 


cond. 


its. 


c+e 


27183 




> 2 000 


c+c+f 


38 868 


1.18 • 10 6 


1303 


c+e+f (3eigv) 


38 868 


72 704.80 


674 



Table 10: Convergence of the non- adaptive 2-level BDDC method with different 
constraints, mining reel problem. Reproduced from [36]. 



r 


Nc 




cond. 


its. 


oo(=c+e) 


27183 


1.76 ■ 10 e 




> 2 000 


10000 


28 023 


9 992.61 


9538.18 


910 


5000 


28 727 


4934.62 


4849.75 


673 


1000 


32 460 


999.90 


2179.79 


391 


500 


35 017 


499.64 


1277.59 


318 


100 


42 849 


99.89 


840.74 


213 


50 


46 093 


49.98 


784.49 


194 


10 


59 496 


< 10 


321.20 


129 


5 


69 249 


< 5 


198.68 


91 


2 


92 467 


< 2 


91.24 


72 



Table 11: Convergence of the adaptive 2-level BDDC method with variable 
target condition number parameter r, mining reel problem. Lower r corresponds 
to more constraints and better convergence. Reproduced from [36]. 
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Figure 6: Examples of difficulties with computational mesh and its partitioning 
for the mining reel problem: (i) rope modelled as axisymmetric rings connected 
at edges of three-dimensional elements (top left), (ii) disconnected subdomains 
(top right), (iii) elongated substructures (bottom left), (iv) spurious mechanisms 
within subdomains, such as elements connected to rest of the subdomain only 
at single node (bottom right). Reproduced from [36]. 







Nc 




cond. 


its. 


100 


42 849 - 


-2 378 


99.89 


3 567.02 


382 


10 


59 496- 


-6 419 


< 10 


320.82 


139 


5 


69 249 - 


-8 681 


< 5 


198.55 


98 



Table 12: Convergence of the adaptive 3-level BDDC method with variable 
target condition number parameter r, mining reel problem. Reproduced from 
[36]. 
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the adaptive method is advantageous with respect to computing time only for 
1024 cores. 

We can see, that the scaling is nearly optimal, with the deviation caused 
probably again by the small size of the problem compared to the core counts 
used in this experiment. In the parallel case, the 3-level approach did not work 
well neither with nor without adaptive selection of constraints, requiring more 
than 5000 PCG iterations in both cases. The slow convergence of the adaptive 3- 
level method is probably caused by the limit of ten adaptive constraints per face, 
which seems to be insufficient for this difficult problem - in Matlab experiments, 
as many as 8681 adaptive constraints were generated among 32 subdomains on 
the second level in order to satisfy uj < r 2 = 5 (Tab. 12). 
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Figure 7: Strong scaling test for the mining reel problem containing 1.7M 
degrees of freedom and on the first level divided into 1024 subdomains with 
3893 faces with arithmetic/adaptive constraints, computational time (left) and 
speed-up (right) separately for set-up and PCG phases, and their sum ('solve'), 
comparison of non-adaptive (610 its.) and adaptive (200 its.) parallel 2-level 
BDDC. 



8.4 Linear elasticity analysis of a geocomposite sample 

Finally, the algorithm is applied to a problem of elasticity analysis of a cubic 
geocomposite sample. The sample was analyzed in [2] and provided by the 
authors for testing of our implementation. The length of the edge of the cube is 
75 mm, and the cube is composed of five distinct materials identified by means 
of computer tomography. 

Different material properties cause anisotropic response of the cube even for 
simple axial stretching in z direction (Fig. 8 right). The problem is discretized 
using unstructured grid of about 12 million linear tetrahedral elements, resulting 
in approximately 6 million degrees of freedom. The mesh was divided into 1024 
subdomains on the first and into 32 subdomains on the second level, resulting 
in 5635 and 100 faces, respectively. 

Table 13 summarizes the number of iterations and estimated condition num- 
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Figure 8: Elasticity in a geocomposite sample; Young's modulus due to different 
materials (left), magnitude of displacement on slices (right). Mesh contains 12 
million linear tetrahedral elements and approx. 6 million degrees of freedom. 
Data by courtesy of Radim Blaheta and Jin Stary. 

ber of the preconditioned operator for the combinations of 2- and 3-level BDDC 
with non-adaptive and adaptive selection of constraints. For this problem, num- 
ber of iterations was reduced to approximately one half by using adaptivity. We 
can also see, that number of iterations grows considerably when going from 2 
to 3 levels for this problem. 



levels 


N 

£=l(/2) 


n f 

Ur £ = l(/2) 


non-adaptive 
its. cond. 


adaptive 
its. cond. 


2 
3 


1024 
1024/32 


6.1M 1.3M 5635 
6.1M 1.3M 5635/100 


65 94.8 
214 2724.9 


36 37.7 
113 1792.2 



Table 13: Number of iterations and condition number estimate for the geo- 
composite problem, 6 million degrees of freedom. Comparison for two and three 
levels for non-adaptive and adaptive BDDC. 

In Figs. 9 and 10, we report strong scaling of the parallel implementation on 
this problem for 2-level and 3-level BDDC, respectively. The strong scaling is 
again nearly optimal. 

We can again see that while for the non-adaptive approach, most time is 
spent in the PCG iterations, for the adaptive approach, the curve for the set-up 
phase is almost indistinguishable from the one for total solution time, and the 
set-up clearly dominates the solution. For the two-level method, MUMPS was 
not able to solve the coarse problem on 1024 cores and so this value is omitted 
in Fig. 9. 
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Figure 9: Strong scaling test for the geocomposite problem (Fig. 8) contain- 
ing approx. 6 million degrees of freedom, on the first level divided into 1024 
subdomains with 5635 faces with arithmetic/adaptive constraints. Computa- 
tional time (left) and speed-up (right) separately for set-up and PCG phases, 
and their sum ('solve'), comparison of non-adaptive (65 its.) and adaptive (36 
its.) parallel 2-level BDDC. 
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Figure 10: Strong scaling test for the geocomposite problem (Fig. 8) contain- 
ing approx. 6 million degrees of freedom, on the first level divided into 1024 
subdomains with 5635 faces with arithmetic/adaptive constraints, and on the 
second level into 32 subdomains with 100 faces. Computational time (left) and 
speed-up (right) separately for set-up and PCG phases, and their sum ('solve'), 
comparison of non-adaptive (214 its.) and adaptive (113 its.) parallel 3-level 
BDDC. 
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9 Conclusion 



We have presented the algorithm of Adaptive-Multilevel BDDC method for 
three-dimensional problems, and its parallel implementation. The algorithm 
represents the concluding step of combining ideas developed separately for adap- 
tive selection of constraints in BDDC [23, 27, 31, 35], and for the multilevel 
extension of the BDDC method [25, 26, 32]. The algorithm and its serial im- 
plementation was studied in the thesis [36], and the serial algorithm for two 
dimensional problems also in [39]. 

The Adaptive BDDC method aims at numerically difficult problems, like 
those containing severe jumps in material coefficients within the computational 
domain. It recognizes troublesome parts of the interface by solving a gener- 
alized eigenvalue problem for each pair of adjacent subdomains which share a 
face. By dominant eigenvalues the method detects where constraints need to 
be concentrated in order to improve the coarse space, thus reducing number of 
iterations. On the other hand, the Multilevel BDDC aims at improving scal- 
ability of the BDDC method for very large numbers of subdomains, for which 
the coarse problem gets too large and/or fragmented to be solved by a parallel 
direct solver. However, as theory suggests and experiments confirm, Multilevel 
BDDC leads to an exponential growth of the condition number and the number 
of iterations. 

The Adaptive- Multilevel BDDC method provides a kind of synergy of the 
adaptive and the multilevel approaches. Our results confirm, that adaptively 
generated constraints are capable of reducing the rate of growth of the condition 
number with levels. At the same time, the extension to three levels improved 
scalability of the adaptive 2-level approach, and for large problems and core 
counts, we have been able to obtain results we could not get by the 2-level 
method. 

A convenient way of preconditioning the LOBPCG method based on compo- 
nents of BDDC was presented, effectively converting the generalized eigenvalue 
problems to ordinary eigenproblems. In our computations, this preconditioning 
led to large savings of number of LOBPCG iterations and corresponding com- 
puting time. However, to reduce the time necessary for solving the eigenprob- 
lems further, we have restricted the maximal number of adaptively generated 
constraints per face to ten in our parallel computations. For this reason, the 
resulting performance of the adaptive method was not as optimal as the serial 
tests (with arbitrary number of constraints per face) suggested. 

We have described a parallel implementation of the algorithm available in 
our open-source library BDDCML. The solver has been successfully applied to 
systems of equations with over 400 million unknowns solved on 32 thousand 
cores. Presented results confirm, that both adaptive and non-adaptive imple- 
mentations are reasonably scalable. However, presented computations have also 
revealed sub-optimal scaling especially on large numbers of cores, and these re- 
sults will provide a basis for further optimization of the solver. 

We have presented results of two benchmark and two engineering problems 
of structural analysis. On all problems, the adaptive selection of constraints led 
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to reduced number of PCG iterations. However, for most problems, this fact did 
not lead to savings in computational time, and the cost of generating adaptive 
constraints was not compensated by saved iterations. 

It can be concluded that the adaptive method is not suitable for simple 
problems, where also non-adaptive (even multilevel) method would converge 
reasonably fast. For problems with difficulties, the non-adaptive BDDC method 
leads to large cost of PCG iterations, especially for using several levels. On the 
contrary, the set-up phase with solution of local eigenproblems mostly dominates 
the overall solution time for the Adaptive- Multilevel BDDC method. Which 
approach is finally advantageous depends on a particular problem. Apart of the 
aspect of computational time, we have already encountered several problems for 
which the non-adaptive BDDC method failed, and which have been successfully 
solved by adaptive BDDC. 
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